atari game
On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning (NeurIPS 2022)
Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and ...
If you ran experiments...
(a) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? Please refer to both main text and appendix for experiment details.
Did you report error bars (e.g., with respect to the random seed after running experiments multiple ... All adaptation experiments in Procgen and RLBench are run for 3 seeds.
Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal ... As stated in section 2, we use RTX A5000 GPUs each with 24GB memory.
C2F-ARM algorithm and training framework are built based on the original author's implementation
Did you mention the license of the assets?
Resetting the Optimizer in Deep RL: An Empirical Study
We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process consists of solving a sequence of optimization problems in which the loss function changes at each iteration. The common approach to solving this sequence of problems is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain their own internal parameters, such as estimates of the first-order and second-order moments of the gradient, and update them over time. Therefore, information obtained in previous iterations is used to solve the optimization problem in the current iteration. We demonstrate that this can contaminate the moment estimates because the optimization landscape can change arbitrarily from one iteration to the next. To hedge against this negative effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting idea by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification significantly improves the performance of deep RL on the Atari benchmark.
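The resetting idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation and does not involve Rainbow or Atari: it assumes a hand-written scalar Adam and a toy sequence of quadratic losses whose target changes at each iteration. The names `Adam`, `reset`, `solve_sequence`, and `reset_each_iteration`, as well as the hyperparameter values, are hypothetical and chosen only for illustration.

```python
import math


class Adam:
    """Minimal scalar Adam with an explicit reset of its internal state (illustrative only)."""

    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.reset()

    def reset(self):
        # Zero the first- and second-moment estimates and the step counter,
        # so nothing from earlier optimization problems carries over.
        self.m = 0.0
        self.v = 0.0
        self.t = 0

    def step(self, param, grad):
        # Standard Adam update with bias correction.
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def solve_sequence(targets, steps_per_iteration=50, reset_each_iteration=True):
    """Minimize (x - target)^2 for each target in turn (toy stand-in for a changing loss)."""
    opt = Adam()
    x = 0.0
    for target in targets:
        if reset_each_iteration:
            opt.reset()  # start the new optimization problem with fresh moments
        for _ in range(steps_per_iteration):
            grad = 2.0 * (x - target)  # gradient of the current quadratic loss
            x = opt.step(x, grad)
    return x, opt


if __name__ == "__main__":
    targets = [1.0, -1.0, 2.0]
    print("with reset:   ", solve_sequence(targets, reset_each_iteration=True)[0])
    print("without reset:", solve_sequence(targets, reset_each_iteration=False)[0])
```

With `reset_each_iteration=False`, the moments accumulated on one target are still in use when the next target arrives, which is the contamination the abstract describes. The toy problem only shows the mechanics and says nothing about the size of the effect reported on the Atari benchmark.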
- North America > Canada > Alberta (0.14)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
- North America > Canada > Alberta (0.14)
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- North America > United States (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Leisure & Entertainment > Games (0.71)
- Education (0.46)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- North America > Canada > Ontario > Toronto (0.04)